This paper presents the prediction of the ultimate bearing
capacity of a strip footing resting on layered soil (dense sand
overlying loose sand) using random forest regression (RFR). In
this study, 181 data records collected from the literature were
used. 71 % of the total data was randomly selected for training
the model and the rest was used for testing. The input
parameters were the friction angle of the dense sand layer (φ1),
the friction angle of the loose sand layer (φ2), the unit weight
of the dense sand layer (γ1), the unit weight of the loose sand
layer (γ2), the ratio of the thickness of the dense sand layer
below the base of the footing to the width of the footing (H/B),
the ratio of the depth of the footing to the width of the footing
(D/B) and (H+D)/B. The ultimate bearing capacity was the output
in this study. Performance measures were used to compare the
RFR with the artificial neural network (ANN) and the M5P model
tree. The results of this study revealed that the performance of
the RFR was superior to that of the M5P and the ANN. The results
of the sensitivity analysis reveal that the unit weight and the
friction angle of the loose sand layer were the most important
parameters affecting the ultimate bearing capacity of the strip
footing resting on the layered soil.

1. Introduction


In foundation design, the load of the superstructure should be transferred safely to the soil
beneath the foundation without causing shear failure or excessive settlement. Many studies
that determine the ultimate bearing capacity (UBC) of a footing resting on homogeneous soil
are available in the literature. In actual field situations, however, the soil encountered is
frequently layered. Various analytical and experimental methods have been used to determine the
UBC in such cases. Approaches used to determine the UBC of a footing resting on layered soil
include the classical approach [1-10], the semi-empirical approach [1,3-5,10], the kinematic
approach [2,8], the numerical approach [6] and the finite element method [11-16]. Recently,
researchers have focused on the application of soft computing techniques such as artificial
neural network (ANN), support vector machine (SVM), random forest regression (RFR) and M5P
model trees (M5P) in geotechnical engineering. Many ANN-based studies are available in the
literature on the prediction of the bearing capacity and settlement of footings in different
media [17-20], deviator stress [21], the bearing capacity of strip footings resting on
multilayered soil [22], geotechnical parameters [23], the ultimate bearing capacity of skirted
and square footings on sand and confined sand [24,25], the settlement of footings on
cohesionless soils [26], horizontal stress [27], and the unsoaked and soaked bearing ratio [28].
SVM-based studies are available on the prediction of pile capacity [29], the settlement of
footings on cohesionless soils [30], soil water content [31], soil classification and soil
properties [32], and soil moisture from remote sensing data [33]. Very recently, studies on the
prediction of pier scour [34], the infiltration rate of soil [35] and geotechnical parameters
[23] using RFR and M5P have been reported. These studies concluded that the ANN, SVM, RFR and
M5P are able to model geotechnical engineering problems satisfactorily. However, no study is
available in the literature that predicts the ultimate bearing capacity of a strip footing
resting on layered soil (dense sand overlying loose sand) using RFR, M5P and ANN. This study
tries to fill this gap. In the present paper, RFR and M5P are used to predict the UBC of a
strip footing resting on layered soil (dense sand overlying loose sand). Finally, the
performance of these two techniques is compared with that of the ANN, a technique widely used
in geotechnical engineering.


2. Problem statement 


The problem of predicting the UBC of a footing resting on layered soil is illustrated in Fig.
1. The input parameters affecting the ultimate bearing capacity (q_ult) of the footing resting
on layered soil were collected from the experimental and finite element modelling results
reported in [4,36,37] and are listed below.


• Friction angle of the dense sand layer (φ1)

• Friction angle of the loose sand layer (φ2)

• Unit weight of the dense sand layer (γ1)

• Unit weight of the loose sand layer (γ2)

• Ratio of the thickness of the dense sand layer below the base of the footing to the width
  of the footing (H/B)

• Ratio of the depth of the footing to the width of the footing (D/B)

Fig. 1. Strip footing resting on layered soil.

3. Soft computing approaches 


3.1. Random forest regression 


RFR is an ensemble technique, applicable to both regression and classification, that uses a
combination of tree predictors. Each tree is generated using a random vector sampled
independently from the input vector. For regression, the tree predictor takes on numerical
values as opposed to the class labels used by the RF classifier [38]. To grow a tree, RFR uses
a randomly selected parameter, or a combination of parameters, at each node. The training data
for each tree is generated by bagging, a technique in which records are drawn randomly, with
replacement, from the original data reserved for training. The training data can also be
selected randomly to construct an individual tree for each feature combination [39]. In
bagging, 70 % of the original data is used for training and 30 % is left out of every tree
grown. Designing a tree predictor requires a variable selection procedure as well as a pruning
method. A large number of approaches for selecting the variable for tree induction are
available in the literature. The majority of these approaches, such as the information gain
ratio and the Gini index [39-41], assign a quality measure directly to the variable. The RFR
used in this study uses the former approach for the selection of the variable measure. The
Gini index approach determines the impurity of the variable with respect to the output. RFR
lets each tree grow to its maximum depth on the training data using a combination of
variables, and the fully grown trees are not pruned back. This gives RFR an edge over M5P, as
reported by [39]. Further, the choice of variable selection measure and pruning method affects
the performance of tree-based algorithms, as reported by [41-43]. It was also reported by [38]
that the generalization error converges as the number of trees increases, even without pruning
the trees. Overfitting is likewise not a problem, owing to the strong law of large numbers, as
reported by [41]. For RFR, the first user defined parameter is the number of trees to be
developed (designated as k). The second parameter is the number of variables used to create a
tree at each node (designated as m), as reported by [38]. Only the selected variables are
searched for the best split at each node. RFR thus has two parameters, k and m, which are
defined by the user and can take any value. The output of each tree is a numerical value, so
the mean square error can be computed for the numerical predictor. The RF predictor is formed
by averaging over the k trees.
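
The roles of k and m can be illustrated with a short sketch. This is not the Weka implementation used in the study: to stay brief it assumes one-split regression trees (stumps) rather than fully grown trees, and the function names and toy data are hypothetical. It shows only the two ideas described above, bootstrap sampling with replacement for each tree and averaging the predictions of k trees, each built from m randomly chosen variables.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Find the best single split (lowest squared error) over the candidate features."""
    best = None
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, ml, mr)
    if best is None:
        # No usable split (all candidate features constant): predict the mean.
        m_all = mean(targets)
        return (0, float("inf"), m_all, m_all)
    return best[1:]


def predict_stump(stump, row):
    f, thr, ml, mr = stump
    return ml if row[f] <= thr else mr


def fit_forest(rows, targets, k, m, seed=0):
    """Grow k stumps, each on a bootstrap sample and m randomly chosen features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]   # bagging: draw with replacement
        feats = rng.sample(range(n_features), m)      # m random variables for the split
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """The ensemble prediction is the average over the k trees."""
    return mean(predict_stump(s, row) for s in forest)


# Toy data (hypothetical): the target jumps when the first feature passes 4.
rows = [[float(x), 0.0] for x in range(10)]
targets = [0.0] * 5 + [10.0] * 5
forest = fit_forest(rows, targets, k=25, m=1)
low = predict_forest(forest, [1.0, 0.0])
high = predict_forest(forest, [8.0, 0.0])
```

Trees that happen to draw the uninformative second feature fall back to the sample mean, which is why the averaged prediction is smoother than any single tree.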


3.2. M5P model tree 


The M5P model (a binary decision tree) uses linear regression functions at the terminal nodes
(leaves) and is thus able to predict continuous numerical attributes, as reported by [40]. A
divide and conquer technique is adopted to develop the tree-based models. The model tree is
generated in two steps. In the first step, a splitting criterion is used to build a decision
tree. In the M5P model tree algorithm, the splitting criterion is based on the standard
deviation of the class values. The standard deviation of the class values that reach a node is
treated as a measure of the error at that node. The expected reduction in this error as a
result of testing each attribute at that node is then calculated. Because of the splitting
process, the data in the child nodes has a smaller standard deviation than the data in the
parent node and is thus considered purer, as reported by [40]. After examining the possible
splits, M5P picks the one that maximizes the expected error reduction. Such division results
in a large tree-like structure, which leads to overfitting. To avoid overfitting, the tree
must be pruned back and the pruned subtrees replaced with leaves. The second stage of the
design of the model tree thus involves pruning and replacing the subtrees with linear
regression functions. M5P splits the parameter space into subspaces and develops a linear
regression model in each of them. More details about M5P can be found in [40].
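
The splitting criterion described above is commonly written as the standard deviation reduction, SDR = sd(N) - Σ (|Nj|/|N|) sd(Nj), where N is the set of records reaching the node and Nj are the subsets produced by a candidate split. The sketch below evaluates it for every threshold of a single attribute. It is an illustration only: the function names and the toy values are hypothetical, and the use of the population standard deviation is an assumption rather than something stated in the paper.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting `parent` into `subsets`."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_threshold(xs, ys):
    """Return (gain, threshold) of the split on one attribute that maximizes SDR."""
    pairs = sorted(zip(xs, ys))
    best = (float("-inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal attribute values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        gain = sdr(ys, [left, right])
        if gain > best[0]:
            best = (gain, thr)
    return best


# Toy example (hypothetical values): the output changes level between x = 2 and x = 3.
gain, thr = best_threshold([1, 2, 3, 4], [10, 10, 50, 50])
```

On this toy data the chosen threshold lies midway between 2 and 3, because both child nodes then have zero standard deviation.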


3.3. Artificial neural network 


An artificial neural network is a regression model that is able to predict the output for
nonlinear inputs accurately. It draws inspiration from the functioning of the human nervous
system. In this study, a feedforward back propagation algorithm was used. A basic neural
network is an interconnection of input, hidden and output layers, where weights and biases are
generated between the input and hidden layers and between the hidden and output layers.
Initially, the input data was selected and divided into training and testing data based on
[21,24-28]. The training data was then used to train the neural network model, and the number
of iterations was fixed as per the procedure reported by [21,24-28]. The activation function
used in the ANN was the sigmoid function, which is the built-in default in the open source
Weka 3.8 software. According to the literature [21], the sigmoid activation function has been
found to be the most accurate, as it yields the minimum errors. Finally, the testing data was
used to test the model. To check the accuracy of the predicted output against the actual
output, the performance measures were calculated; they are discussed in the subsequent section.
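
A feedforward pass and one back propagation update can be sketched for a very small network. The code below is not the Weka model used in the study: the class name, the single hidden layer of three units, the linear output unit, the omission of momentum and the toy input are all assumptions made for illustration. Only the sigmoid activation and the learning rate of 0.2 are taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One sigmoid hidden layer and one linear output unit (illustrative only)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Weights and biases between the input and hidden layers.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        # Weights and bias between the hidden and output layers.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target, lr=0.2):
        """One gradient descent update for the loss 0.5 * (y - target)^2."""
        h, y = self.forward(x)
        err = y - target
        # Propagate the error back to the hidden layer using the old output weights.
        delta_h = [err * w * hi * (1.0 - hi) for w, hi in zip(self.w2, h)]
        for j, hi in enumerate(h):
            self.w2[j] -= lr * err * hi
        self.b2 -= lr * err
        for j, d in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d
        return 0.5 * err * err


# Repeated updates on a single scaled record (hypothetical values).
net = TinyMLP(n_in=2, n_hidden=3)
losses = [net.train_step([0.5, 0.2], 1.0) for _ in range(50)]
```

The recorded losses shrink as the weights are adjusted, which is the behaviour that the fixed number of training iterations relies on.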

4. Data set and performance measures


The random forest regression (RFR), M5P model tree and artificial neural network (ANN) based
soft computing models were developed using a wide range of data comprising 91 experimental
records (model plate load tests) and 90 theoretical records (two-dimensional finite element
model) collected from different studies reported in the literature [4,36,37]. The total data
(181 records) were divided into two parts. The first part comprises 128 records used for
training. The remaining 53 records were used for testing. The data for training and testing
were selected randomly. The division of the total data into training and testing sets follows
the rules reported by [23,34,35]. The input parameters used for the modelling were the
friction angle of the first sand layer (φ1), the friction angle of the second sand layer (φ2),
the unit weight of the first sand layer (γ1), the unit weight of the second sand layer (γ2),
the ratio of the thickness of the first sand layer below the footing base to the width of the
footing (H/B), the ratio of the depth of the footing to the width of the footing (D/B) and
(H+D)/B, whereas the UBC was considered as the output. The range of each parameter considered
in the study is given in Table 1.


Table 1
Range of the parameters used for modelling.

                      Total data set
Input parameters      Min.      Max.       Avg.       Standard deviation
φ1                    43.00     47.70      44.97      2.49
φ2                    30.00     42.40      36.68      3.87
γ1 (kN/m³)            16.34     20.00      18.82      1.43
γ2 (kN/m³)            13.00     19.00      16.60      2.06
(H+D)/B               0.50      15.00      4.37       3.95
D/B                   0.00      1.00       0.07       0.23
H/B                   0.50      15.00      4.30       3.98
q_ult (kPa)           41.02     4082.60    1765.19    1204.61
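
The random division of the 181 records into 128 training and 53 testing records can be sketched as follows. The function name and the seed are hypothetical, and the records are represented only by their indices; the study itself performed the split in Weka.

```python
import random


def split_indices(n_records, n_train, seed=42):
    """Shuffle record indices and cut them into training and testing sets."""
    rng = random.Random(seed)
    idx = list(range(n_records))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


train_idx, test_idx = split_indices(181, 128)
train_share = round(100 * len(train_idx) / 181)  # percentage used for training
```

With 128 of 181 records in the training set, the training share rounds to the 71 % quoted for this study.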


To check the prediction accuracy of the soft computing techniques (RFR, M5P and ANN), the
performance measures whose mathematical expressions are tabulated in Table 2 were computed and
compared.


The primary performance measures considered were the coefficient of determination (R²) and the
coefficient of correlation (r). Values of R² and r close to 1 indicate a good fit, whereas
values close to 0 indicate a poor fit. The other performance measures, RMSE, MAE, RAE and
RRSE, should be as small as possible: among the models compared, the one with the lowest RMSE,
MAE, RAE and RRSE predicts the output best. The calculated performance measures for the RFR,
M5P and ANN are tabulated in Table 3.

Table 2
Performance measures and their mathematical expressions.

Statistical coefficient               Mathematical expression
Correlation coefficient (r)
Coefficient of determination (R²)
Root mean square error (RMSE)         $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(q_{ult_t} - q_{ult_p}\right)^{2}}$
Mean absolute error (MAE)             $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|q_{ult_t} - q_{ult_p}\right|$
Relative absolute error (RAE)         $\mathrm{RAE} = \dfrac{\sum_{i=1}^{n}\left|q_{ult_t} - q_{ult_p}\right|}{\sum_{i=1}^{n}\left|q_{ult_t} - \bar{q}_{ult_t}\right|} \times 100$
Root relative square error (RRSE)     $\mathrm{RRSE} = \sqrt{\dfrac{\sum_{i=1}^{n}\left(q_{ult_t} - q_{ult_p}\right)^{2}}{\sum_{i=1}^{n}\left(q_{ult_t} - \bar{q}_{ult_t}\right)^{2}}} \times 100$

Note: $q_{ult_t}$, $q_{ult_p}$: target and predicted UBC respectively; $\bar{q}_{ult_t}$, $\bar{q}_{ult_p}$: mean of
the target and predicted UBC respectively; $S_{q_{ult_t}}$, $S_{q_{ult_p}}$: standard deviation of the target and
predicted UBC respectively; n: number of observations.

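
The error measures of Table 2 can be computed with a few lines of code. The sketch below assumes the usual definitions of RMSE, MAE, RAE and RRSE (the last two expressed in percent, relative to the mean of the target values) and the Pearson form of the correlation coefficient; the function name and the sample values are hypothetical and are not data from the study.

```python
import math


def performance_measures(target, predicted):
    """Return r, MAE, RMSE, RAE (%) and RRSE (%) for paired target/predicted values."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predicted) / n
    abs_err = [abs(t - p) for t, p in zip(target, predicted)]
    sq_err = [(t - p) ** 2 for t, p in zip(target, predicted)]
    # Pearson correlation between target and predicted values.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, predicted))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "r": cov / math.sqrt(var_t * var_p),
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        "RAE": 100.0 * sum(abs_err) / sum(abs(t - mean_t) for t in target),
        "RRSE": 100.0 * math.sqrt(sum(sq_err) / var_t),
    }


# Hypothetical target and predicted UBC values in kPa.
measures = performance_measures([100.0, 200.0, 300.0, 400.0],
                                [110.0, 190.0, 310.0, 390.0])
```

RAE and RRSE normalize the errors by the spread of the target values, which is why they can be compared across data sets with different UBC ranges.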
Table 3
Performance measures using RFR, M5P and ANN for the training and the testing data.

Techniques        Training                                       Testing
                  R²    r     MAE     RMSE    RAE    RRSE        R²    r     MAE     RMSE    RAE    RRSE
Random forest     0.98  0.99  70.61   168.18  7.84   13.51       0.96  0.98  124.93  236.50  13.95  19.05
M5P model tree    0.47  0.91  512.52  645.96  56.88  51.88       0.31  0.85  551.69  708.90  61.60  57.12
ANN               0.94  0.97  380.23  467.07  42.20  37.51       0.94  0.97  378.77  460.65  42.29  37.11


The selection of the optimal values of the user defined parameters affects the performance of
the RFR, M5P model tree and ANN. The default user defined parameters in the Weka software were
used initially. A number of trials were then carried out to find the optimal values of the
user defined parameters by comparing the performance measures of each trial. The optimal user
defined parameters obtained for the RFR, M5P and ANN are tabulated in Table 4.

Table 4
Optimum values of user defined attributes for RFR, M5P and ANN.

Classifiers used    User defined parameters
RFR                 k=2, m=2, I=100
M5P                 M=5
ANN                 Learning rate = 0.2, momentum = 0.1, iterations = 4000, hidden layers = 5


5. Results and discussions 


To compare the performance of the selected soft computing techniques for predicting the UBC of
the strip footing resting on layered soil, the performance measures R², r, MAE, RMSE, RAE and
RRSE were calculated and tabulated in Table 3. The predicted and targeted UBC obtained using
the RFR, M5P and ANN techniques for the training and the testing data are shown in Figs. 2-4,
respectively.

Fig. 2. Variation of targeted with the predicted UBC of the footing resting on layered soil using RFR.

Fig. 3. Variation of targeted with the predicted UBC of the footing resting on layered soil using M5P.

Fig. 4. Variation of targeted with the predicted UBC of the footing resting on layered soil using ANN.

Figs. 2-4 and Table 3 reveal that the RFR performs better than the other two techniques in
terms of all the performance measures considered in this study. The performance measures of
the RFR indicate that this technique can be used to predict the UBC of the strip footing
resting on layered soil accurately. In order of decreasing prediction accuracy, the techniques
rank as RFR, ANN and M5P.


6. Sensitivity analysis 


A sensitivity analysis was performed with the RFR technique to identify the input parameters
that most affect the UBC of the strip footing resting on layered soil. For this purpose,
different combinations of the input parameters were used. In each combination, one of the
input parameters was removed and the RFR was rerun to check the influence of the omitted
parameter on the output. For each combination of input parameters, the performance measures
(R², r, MAE, RMSE, RAE and RRSE) were calculated and tabulated in Table 5.


Table 5
Sensitivity analysis using RFR.

                                                                   Random forest regression
Input combinations                        Input parameter removed  R²    r     MAE    RMSE    RAE    RRSE
φ1, φ2, γ1, γ2, H/B, D/B and (H+D)/B      -                        0.98  0.99  70.61  168.18  7.84   13.51
φ2, γ1, γ2, H/B, D/B and (H+D)/B          φ1                       0.98  0.99  69.98  166.86  7.77   13.40
φ1, γ1, γ2, H/B, D/B and (H+D)/B          φ2                       0.96  0.97  77.53  177.33  8.60   14.24
φ1, φ2, γ2, H/B, D/B and (H+D)/B          γ1                       0.98  0.99  71.75  166.08  7.96   13.34
φ1, φ2, γ1, H/B, D/B and (H+D)/B          γ2                       0.95  0.96  91.13  188.40  10.11  15.13
φ1, φ2, γ1, γ2, D/B and (H+D)/B           H/B                      0.98  0.99  74.65  172.75  8.28   13.87
φ1, φ2, γ1, γ2, H/B and (H+D)/B           D/B                      0.98  0.99  72.88  171.13  8.09   13.74
φ1, φ2, γ1, γ2, H/B and D/B               (H+D)/B                  0.98  0.99  74.53  172.30  8.27   13.84


Table 5 reveals that the unit weight of the loose sand layer (γ2), followed by the friction
angle of the loose sand layer (φ2), has the key influence on the RFR prediction of the UBC of
the strip footing resting on layered soil in comparison with the other input parameters.
Removing any of the other input parameters (those other than φ2 and γ2) has no major influence
on the prediction of the UBC of the strip footing resting on layered soil using RFR. The
results further suggest that the RFR provides the best performance with the data combination
involving the remaining input parameters. This is attributed to the fact that the properties
of the loose sand layer play a major role in predicting the UBC of the strip footing resting
on layered soil.
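
The ranking implied by Table 5 can be reproduced by comparing the RMSE obtained after removing each input with the RMSE of the full model. The sketch below only rearranges the RMSE values reported in Table 5; the variable names and the ASCII parameter labels (phi1, gamma2 and so on) are hypothetical.

```python
# RMSE of the RFR with all seven inputs, from Table 5.
BASELINE_RMSE = 168.18

# RMSE after removing one input parameter, from Table 5.
rmse_without = {
    "phi1": 166.86,
    "phi2": 177.33,
    "gamma1": 166.08,
    "gamma2": 188.40,
    "H/B": 172.75,
    "D/B": 171.13,
    "(H+D)/B": 172.30,
}

# A larger increase in RMSE means the removed parameter mattered more.
increase = {name: value - BASELINE_RMSE for name, value in rmse_without.items()}
ranking = sorted(increase, key=increase.get, reverse=True)
```

Sorting by the increase in RMSE places the two loose sand layer properties first, in line with the discussion of Table 5.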

7. Conclusions


This paper investigates the potential of the RFR technique for predicting the UBC of a strip
footing resting on layered soil, and for identifying the parameters that most affect that
prediction, using experimental and theoretical data reported in the literature. Based on the
results and discussion presented, the following conclusions are put forward.


1. The random forest regression algorithm predicts the ultimate bearing capacity of the strip
   footing resting on dense sand overlying a loose sand deposit better than the M5P and the
   artificial neural network.

2. In order of decreasing accuracy in predicting the ultimate bearing capacity of the strip
   footing resting on dense sand overlying a loose sand deposit, the techniques rank as RFR,
   ANN and M5P.

3. The random forest regression algorithm can be used effectively to identify the input
   parameters that affect the ultimate bearing capacity of the strip footing resting on dense
   sand overlying a loose sand deposit.

4. The unit weight and the friction angle of the loose sand layer play a major role in
   predicting the ultimate bearing capacity of the strip footing resting on dense sand
   overlying a loose sand deposit.


Notations
ANN      Artificial neural network
RFR      Random forest regression
φ1       Friction angle of the dense sand layer
φ2       Friction angle of the loose sand layer
γ1       Unit weight of the dense sand layer
γ2       Unit weight of the loose sand layer
H        Thickness of the dense sand layer
B        Width of footing
D        Depth of the footing
q_ult    Ultimate bearing capacity
RMSE     Root mean square error
MAE      Mean absolute error
MAPE     Mean absolute percentage error
RAE      Relative absolute error
RRSE     Root relative square error
R²       Coefficient of determination
r        Correlation coefficient
N        Set of the data records that reach the node
Nj       Sets resulting from splitting the node according to a given attribute
sd       Standard deviation
F’       Predicted value passed on to the following higher node
c        Predicted value passed to the current node from the lower node
b        Estimated value using the technique at this node
1        Number of training examples that reach the node below
J        Constant
k        Number of trees developed
m        Number of variables required to create a tree at each node